Bilingual alignment of anaphoric expressions
Abstract
In this paper we present an automatic mechanism for bilingual (Spanish-English) alignment of anaphoric expressions. For this purpose, two anaphora resolution systems based on linguistic preferences and constraints were used: SUPPAR for Spanish and MARS for English. These systems were developed independently, and each is presented individually with its evaluation results. Most of the paper presents an automatic alignment method for pronominal anaphora in Spanish and English. Once an anaphor has been resolved in both languages, this method matches anaphoric expressions and antecedents from both texts. A bitext map method is used for the alignment, and a set of bilingual texts is used for the evaluation. These texts were extracted from several official European Community documents (EUR-Lex database). The alignment mechanism can be applied to different tasks related to Machine Translation, such as pattern learning for translation or evaluation of the automatic generation of multilingual anaphora.
Similar resources
Annotation of Anaphoric Expressions in an Aligned Bilingual Corpus
This paper discusses a French-English corpus annotated and aligned at the anaphoric level. It also presents an annotation scheme based on the study of a detailed corpus featuring different types of correspondences and mismatches. The scheme, which is adapted from EAGLES recommendations, supports alignment at the anaphoric level and caters for the different kinds of mismatches.
Johan Segura and Violaine Prince Using Alignment to detect associated multiword expressions in bilingual corpora
Translating multiword expressions from one language to another requires recognizing them as such. Bilingual multiword expressions are an issue when they are not exact word-to-word translations of each other. The following examples are provided for a French-English translation task: (1) phrasal verbs such as « to call in on » becoming « rendre visite », (2) « sorry to hear that », that a human tra...
What kind of problems do protein interactions raise for anaphora resolution? - A preliminary analysis
In this preliminary study, we analyzed the kinds of anaphoric expressions that occur in expressions describing protein interactions found in biological text. We also studied the impact of anaphora resolution on protein interaction extraction when an off-the-shelf anaphoric resolver (i.e., not one specially developed for this domain) is used, looking at full texts as well as abstracts. Our r...
A Hybrid Word Alignment Approach to Improve Translation Lexicons with Compound Words and Idiomatic Expressions
In this paper, we present a hybrid approach to align single words, compound words and idiomatic expressions from bilingual parallel corpora. The objective is to automatically develop, improve and maintain translation lexicons. This approach combines linguistic and statistical information in order to improve word alignment results. The linguistic improvements taken into account refer to the use ...
Bilingual Terminology Extraction Using Multi-level Termhood
Purpose: Terminology is the set of technical words or expressions used in specific contexts, which denotes the core concept in a formal discipline and is usually applied in the fields of machine translation, information retrieval, information extraction and text categorization, etc. Bilingual terminology extraction plays an important role in the application of bilingual dictionary compilation, ...